Discovering Disentangled Representations with the F Statistic Loss
Abstract
We propose and evaluate a novel loss function for discovering deep embeddings that make explicit the categorical and semantic structure of a domain. The loss function is based on the F statistic, which describes the separation of two or more distributions. This loss has several key advantages over previous approaches: it does not require a margin or arbitrary parameters for determining when distributions are sufficiently well separated; it is expressed as a probability, which facilitates its combination with other training objectives; and it seems particularly well suited to disentangling semantic features of a domain, leading to more interpretable and manipulable representations.

In typical classification tasks, the input features (whether images, speech, text, or other measurements) contain only implicit information about category labels, and the job of a classifier is to transform the input features into a representation that makes category labels explicit. The traditional representation has been a localist or one-hot encoding of categories, but an alternative approach has recently emerged in which the representation is a distributed encoding in a high-dimensional space that captures category structure via metric properties of the space. The middle panel of Figure 1 shows a projection of instances from three categories to a two-dimensional space. The projection separates inputs by category and therefore facilitates classification of unlabeled instances via proximity to the category clusters (see the sketch following Figure 1). Such a deep embedding also allows new categories to be 'learned' from a few labeled examples that are projected into the embedding space. The literature is somewhat splintered between researchers focusing on deep embeddings, which are evaluated via k-shot learning [1, 2, 3], and researchers focusing on k-shot learning, who have found deep embeddings to be a useful method [4, 5].

Figure 1 illustrates a fundamental trade-off in formulating an embedding. Moving from the left frame to the right, the intra-class variability increases and the inter-class structure becomes more conspicuous. In the leftmost panel, the clusters are well separated but the classes are all equally far apart. In the rightmost panel, the clusters are highly overlapping and the blue and purple cluster centers are closer to one another than to the yellow. Separating clusters is desirable, but so is capturing inter-class similarity. If this similarity is suppressed, then instances of a novel class will not be mapped in a sensible manner, that is, a manner sensitive to input features, semantic features, and their correspondence. The middle panel reflects a compromise between discarding variability among instances of the same class and preserving relationships among the classes. With this compromise, deep embeddings can be used to model hierarchical category structure and can facilitate partitioning the instances along multiple dimensions, e.g., disentangling content and style [6].

Figure 1: Alternative two-dimensional embeddings of instances of three categories. Points represent instances and color indicates the category label. In the leftmost frame, the points are superimposed on one another.
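The proximity-based classification described above can be made concrete with a short sketch. This is our own minimal illustration, not the paper's implementation; the function and variable names are ours, and it assumes embeddings have already been produced by some network. It assigns each unlabeled query point to the class whose centroid is nearest in the embedding space, the standard recipe in embedding-based k-shot evaluation.

```python
import numpy as np

def nearest_centroid_predict(support_z, support_y, query_z):
    """Assign each query embedding to the class with the nearest centroid.

    support_z: (n_support, d) embeddings of labeled examples
    support_y: (n_support,) integer class labels
    query_z:   (n_query, d) embeddings of unlabeled examples
    """
    classes = np.unique(support_y)
    # One centroid per class: the mean of that class's support embeddings.
    centroids = np.stack(
        [support_z[support_y == c].mean(axis=0) for c in classes]
    )
    # Euclidean distance from every query point to every centroid.
    dists = np.linalg.norm(query_z[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(dists, axis=1)]
```

With a good embedding, a handful of labeled support examples per novel class suffices to place sensible centroids, which is why such embeddings are evaluated via k-shot learning.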
The trade-off in Figure 1 points to a challenge in constructing deep embeddings. Some existing methods aim to perfectly separate categories in the training set [1], which may not be appropriate if there are labeling errors or noise in the data. Other methods require a margin or other parameter to determine how well separated the categories should be in order to prevent overfitting [7, 2, 3, 8]. We propose a new method that automatically balances the trade-off using the currency of probability and statistical hypothesis testing. It also manages to align dimensions of the embedding space with categorical and semantic features, thereby facilitating the disentangling of representations.

1 Using the F statistic to separate classes

For expository purposes, consider two classes, $C = \{1, 2\}$, having $n_1$ and $n_2$ instances, which are mapped to a one-dimensional embedding. The embedding coordinate of instance $j$ of class $i$ is denoted $z_{ij}$. The goal of any deep embedding procedure is to separate the coordinates of the two classes. In our approach, we quantify the separation via the probability that the true class means in the underlying environment, $\mu_1$ and $\mu_2$, are different from one another. Our training goal can thus be formulated as minimizing $\Pr(\mu_1 = \mu_2 \mid s(z), n_1, n_2)$, where $s(z)$ denotes summary statistics of the labeled embedding points. This posterior is intractable, so we instead operate on the likelihood $\Pr(s(z) \mid \mu_1 = \mu_2, n_1, n_2)$ as a proxy.

We borrow a particular statistic from analysis of variance (ANOVA) hypothesis testing for equality of means. The statistic is a ratio of between-class variability to within-class variability:

$$s = \tilde{n} \, \frac{\sum_i n_i (\bar{z}_i - \bar{z})^2}{\sum_{i,j} (z_{ij} - \bar{z}_i)^2},$$

where $\bar{z}_i = \langle z_{ij} \rangle_j$ and $\bar{z} = \langle \bar{z}_i \rangle_i$ are expectations, and $\tilde{n} = n_1 + n_2 - 2$. Under the null hypothesis $\mu_1 = \mu_2$ and an additional normality assumption, $z_{ij} \sim \mathcal{N}(\mu, \sigma)$, our statistic $s$ is a draw from a Fisher-Snedecor (or F) distribution with degrees of freedom 1 and $\tilde{n}$: $S \sim F_{1,\tilde{n}}$. A large $s$ indicates that embeddings from the two different classes are well separated relative to two embeddings from the same class, which is unlikely under $F_{1,\tilde{n}}$. Thus, the CDF of the F distribution offers a measure of the separation between classes:

$$\Pr(S < s \mid \mu_1 = \mu_2, \tilde{n}) = I\!\left(\frac{s}{s + \tilde{n}};\ \frac{1}{2},\ \frac{\tilde{n}}{2}\right),$$

where $I(x;\, a, b)$ denotes the regularized incomplete beta function.
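To make the computation concrete, here is a minimal NumPy/SciPy sketch (our own, not the authors' code) of the separation probability for two classes in a one-dimensional embedding. It follows the definitions above and uses scipy.stats.f.cdf for the F distribution CDF. Note that SciPy is not autograd-differentiable, so using this quantity as a training loss would require a differentiable implementation of the regularized incomplete beta function.

```python
import numpy as np
from scipy.stats import f as f_dist

def separation_probability(z1, z2):
    """Pr(S < s | mu1 == mu2) for 1-D embedding coordinates of two classes.

    z1, z2: 1-D arrays of embedding coordinates for classes 1 and 2.
    Returns a value in (0, 1); larger means better-separated classes.
    """
    n1, n2 = len(z1), len(z2)
    n_tilde = n1 + n2 - 2
    zbar1, zbar2 = z1.mean(), z2.mean()
    zbar = (zbar1 + zbar2) / 2  # mean of the class means, as in the text
    # Between-class and within-class sums of squares.
    ss_between = n1 * (zbar1 - zbar) ** 2 + n2 * (zbar2 - zbar) ** 2
    ss_within = ((z1 - zbar1) ** 2).sum() + ((z2 - zbar2) ** 2).sum()
    s = n_tilde * ss_between / ss_within
    # CDF of the F(1, n_tilde) distribution evaluated at s.
    return f_dist.cdf(s, 1, n_tilde)

# Example: well-separated classes give a probability near 1.
rng = np.random.default_rng(0)
z1 = rng.normal(0.0, 1.0, size=20)
z2 = rng.normal(5.0, 1.0, size=20)
print(separation_probability(z1, z2))
```

A natural training objective, under our reading of the text, is the negative log of this probability, which pushes classes apart only as far as the statistics warrant, with no margin hyperparameter.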
Similar Papers
Forced to Learn: Discovering Disentangled Representations Without Exhaustive Labels
Learning a better representation with neural networks is a challenging problem, which has been tackled extensively from different perspectives in the past few years. In this work, we focus on learning a representation that could be used for clustering and introduce a novel loss component that substantially improves the quality of the produced clusters, is simple to apply to an arbitrary cost function, a...
Learning Deep Disentangled Embeddings with the F-Statistic Loss
Deep-embedding methods aim to discover representations of a domain that make explicit the domain’s class structure. Disentangling methods aim to make explicit compositional or factorial structure. We combine these two active but independent lines of research and propose a new paradigm for discovering disentangled representations of class structure; these representations reveal the underlying fa...
Few-shot Classification by Learning Disentangled Representations
Machine learning has improved state-of-the-art performance in numerous domains by using large amounts of data. In reality, labelled data is often not available for the task of interest. A fundamental problem of artificial intelligence is finding a representation that can generalize to never-before-seen classes. In this research, the power of generative models is combined with disentangled repr...
Unsupervised Learning of Disentangled Representations from Video
We present a new model, DRNET, that learns disentangled image representations from video. Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation that factorizes each frame into a stationary part and a temporally varying component. The disentangled representation can be used for a range of tasks. For example, applying a standard LSTM to the ti...
Classification and Comparison of Methods for Discovering Coverage Loss Areas in Wireless Sensor Networks
In recent years, wireless sensor network data has come to be regarded as an ideal source, in terms of speed, accuracy, and cost, for studying the Earth's surface. One of the most important challenges in this area is network signal coverage and finding coverage holes...